Implementations of the table data structure
Collision likelihoods and load factors for hash tables
A simple Hash Table in operation
Hash collisions occur when two different keys produce the same hash value. To handle these collisions, various techniques are employed. Let's explore three standard approaches: Buckets, Direct Chaining, and Open Addressing.
Imagine a scenario where we allocate a two-dimensional array and use each column as a bucket. Keys that produce the same hash value are placed in the same column. For example, let's say we have airports and their corresponding codes:
0 1 2 3 4 5 6 7 8 9 10 LAX PHL FRA GCM AKL ORY DCA HKG GLA
Here, keys like PHL and HKG fall into the same bucket. However, this method requires more space than necessary and becomes slower as the table fills up.
In this method, instead of reserving entire sub-arrays for collided keys, we create linked lists for each key. For instance:
Here, keys are stored in linked lists, which eliminates the need for reserving excessive space. However, finding a particular key requires traversing lists.
We can compute the average size of non-empty lists using probabilities. With 'n' items in an array of size 'm', the probability of no items landing in a slot is calculated. From this, we can derive the average number of items in a non-empty list. Linear search for an item in a list of size 'k' takes, on average, 'k + 1 / 2' comparisons.
This approach involves finding another open location for collided entries. Two common strategies are linear probing and double hashing.
Linear Probing: If a collision occurs, the algorithm searches for the next available slot by decrementing the index until an empty space is found.
Double Hashing: This method involves using a secondary hash function to find an empty location, which mitigates clustering issues associated with linear probing.
Space Efficiency: Direct chaining saves space by not reserving entire buckets but relies on linked lists, which can be memory-intensive.
Time Complexity: Open addressing provides constant-time complexity for insertion and search operations, making it efficient for large datasets.
Traversal: Direct chaining and buckets require traversal of linked lists or columns, impacting efficiency, whereas open addressing methods provide faster traversal.
Adaptability: Open addressing allows for dynamic resizing of the hash table to maintain optimal load factors, ensuring efficient performance over time.
In conclusion, each collision resolution technique has its advantages and drawbacks. Understanding the trade-offs between space efficiency, time complexity, and adaptability is crucial in selecting the appropriate approach based on the specific requirements of the application.
A hash code is a unique numeric value assigned to each key in a hash map. It acts as a fingerprint for the key, helping in quick identification and retrieval of associated data. Though not necessarily limited to a specific range, hash codes should aim to minimize collisions, where different keys produce the same hash code. Ensuring consistency, equal keys must yield the same hash code. Essentially, a hash code serves as a compact representation of a key, facilitating efficient storage and retrieval operations in hash-based data structures.